Abstract

Mappings between related ontologies are increasingly used to support
data integration and analysis tasks. Changes in the ontologies also require
the adaptation of ontology mappings. So far the evolution of ontology
mappings has received little attention, although ontologies change
continuously, especially in the life sciences. We therefore analyze how
mappings between popular life science ontologies evolve for different match
algorithms. We also evaluate which semantic ontology changes primarily affect
the mappings. We further investigate alternatives to predict or estimate
the degree of future mapping changes based on previous ontology and
mapping transitions.
1 Introduction

Ontologies have become increasingly important in the life sciences [H [TH]. They
are used to semantically annotate molecular-biological objects such as proteins
or pathways [57]. Different ontologies of the same domain often contain
overlapping and related information. For instance, information about mammalian
anatomy can be found in NCI Thesaurus [19] and Adult Mouse Anatomy [1].
Ontology mappings are used to express the semantic relationships between
different but related ontologies, e.g., by linking equivalent concepts of two
ontologies.

Mappings between related ontologies are useful in many ways, in particular
for data integration and enhanced analysis [21], [15]. For example, such
mappings are needed to merge ontologies, e.g., to create an integrated
cross-species anatomy ontology such as the Uber ontology [5S|. Anatomy
ontology mappings may also be useful to transfer knowledge from different
experiments between species [3]. Furthermore, mappings can help to find
objects with similar ontological properties as interesting targets for a
comparative analysis. Ontology curators can further find missing ontology
annotations and get recommendations for possible ontology enhancements based
on mappings to other ontologies.

Ontologies undergo continuous modifications so that new ontology versions
are released periodically [13]. New versions typically incorporate enhanced
knowledge, such as additional concepts, relationships, and attribute values.
Existing information can also be revised or even deleted. Such ontology
changes can invalidate previously determined ontology mappings so that they
may have to be re-determined to remain useful. Unfortunately, determining
ontology mappings is an expensive process even with the help of
semi-automatic ontology matching techniques [71 [51] that still involve a
manual verification of correspondences and a parametrization effort. The
importance of determining and adapting ontology mappings is underlined by the
popular Ontology Alignment Evaluation Initiative (OAEI) [22]. OAEI provides
real-world test data sets, in particular for matching the Adult Mouse Anatomy
Ontology against the anatomy part of NCI Thesaurus. Unfortunately, the
reference mapping of the anatomy task is based on 5-year-old ontology
versions^1 so that its quality for the current ontology versions remains
unclear.

The evolution of ontology mappings has received very little attention so far, 
especially for the life science domain. For example, it is unknown to what
degree and how mappings between popular life science ontologies change and how
ontology changes affect ontology mappings. There are many ways to compute 
mappings and it is not clear to what degree different match methods result 
in differently stable ontology mappings. Finally, we would like to investigate 
to what degree one can predict future mapping changes based on previously 
observed ontology and mapping changes. Such information is expected to be 
useful for deciding about whether a previous ontology mapping is still reliable 
and up-to-date or whether one has to perform an expensive adaptation of the 
mapping. 

To address these questions and issues we make the following contributions: 

• We introduce a generic model for ontology and mapping evolution as well
as for their inter-dependencies. The model supports analyzing the impact
of ontology evolution on mapping evolution, e.g., what ontology changes
lead to the addition or deletion of correspondences in the mapping. (Sec. 3)

• We apply our model to three life science scenarios and evaluate how
mappings between popular life science ontologies evolve. We also investigate
mapping evolution for different match techniques. (Sec. 4)

• We propose and evaluate two approaches to estimate the number of
mapping changes based on previous ontology and mapping changes. (Sec. 5)

In Sec. 2 we present preliminaries and outline the general scenario. We
describe related work in Sec. 6 and conclude in Sec. 7.

2 Preliminaries 

2.1 Ontology, Mapping, and Matching 

In general an ontology $O = (C, R, A)$ consists of concepts $C$ which are
interrelated by directed relationships $R$. Each concept has an unambiguous
identifier such as an accession number. A concept typically has further
attributes $a \in A$ to describe the concept, e.g., name, synonyms, or
definition. A relationship $r \in R$ forms a directed connection between two
concepts and has a specific type, e.g., is_a or part_of. An ontology mapping
$M_{O1,O2}$ is a set of correspondences $(c_1, c_2)$ whereby each
correspondence interconnects two concepts $c_1 \in O1$ and $c_2 \in O2$ of
the two ontologies. The mapping semantics depends on the intended use case
but we assume that all correspondences of a mapping express the same
semantic type, e.g., is-equivalent-to or is-related-to.

^1 As of 2012, the current reference ontology mapping was created in 2007.

Figure 1: General evolution scheme with multiple ontology and mapping
versions

Since a purely manual creation of ontology mappings is a tedious and
labor-intensive task, such mappings are usually determined by semi-automatic
ontology matching techniques (see Sec. 6 for Related Work). Most matching
approaches are metadata-based, i.e., they use the ontology representations
themselves to find related concepts, in particular the names of concepts and
contextual information like the names of the parent or child concepts within
the ontologies. In our evaluation, we will analyze mapping changes for three
typical metadata-based matchers (Sec. 4).

2.2 Versioning Scheme 

We define an ontology version $O_v = (C_v, R_v, A_v)$ as a snapshot of an
ontology $O$ released at a specific point in time. For simplicity we
enumerate the versions with ascending numbers $v = 1, 2, \ldots$ rather than
using the actual release dates.

Ontology changes affect previously determined ontology mappings so that
these mappings should be continuously adapted. Fig. 1 illustrates the general
versioning scheme we adopt in this paper. There is a series of versions
($v = 1 \ldots k$) for a pair of ontologies $O1$ and $O2$ that are connected
by an ontology mapping $M_{O1,O2}$. For simplicity we determine ontology
mappings only between ontologies of the same version number, i.e., we create
mappings $M_v$ only between ontology versions $O1_v$ and $O2_v$ referring to
the same specific point in time.

The difference between two ontology and mapping versions is denoted by
$diff(O_v, O_{v+1})$ and $mdiff(M_v, M_{v+1})$, respectively. The next
section explains $diff$ and $mdiff$ in more detail.
Change operation                                            Type
----------------------------------------------------------  ---------------------
Insertion of a new concept to $O_{v+1}$                     Information extension
Insertion of a subgraph to a concept                        Information extension
Insertion of new relationship in $O_{v+1}$                  Information extension
Addition of an attribute (to an existing concept)           Information extension
Mark concept as non-obsolete                                Information extension
Deletion of a concept in $O_v$                              Information reduction
Removal of a subgraph                                       Information reduction
Deletion of a relationship in $O_v$                         Information reduction
Deletion of an existing attribute                           Information reduction
Mark concept as obsolete                                    Information reduction
Split concept of $O_v$ into multiple concepts in $O_{v+1}$  Information revision
Merge concepts of $O_v$ into a single concept in $O_{v+1}$  Information revision
Concept substitution                                        Information revision
Move concept                                                Information revision
Change attribute value                                      Information revision

Table 1: COntoDiff change operations (including their categorization in three
groups) for ontology evolution $O_v \mapsto O_{v+1}$.



3 Change Model for Ontologies and Mappings 

We first describe our change model for ontologies and mappings and categorize 
the changes into different groups. We also propose simple change ratio indica- 
tors to assess the evolution intensity between successive ontology and mapping 
versions. We then propose indicators to assess the impact of ontology changes 
on ontology mappings. 

3.1 Ontology Changes 

We start by defining what changes can occur between successive ontology
versions $O_v$ and $O_{v+1}$. Our model is based on the COntoDiff algorithm
described in [12]. COntoDiff computes the difference $diff(O_v, O_{v+1})$
between an old and a new version of an ontology; the result consists of the
set of change operations that, when applied to $O_v$, transform the old into
the new version. Basic change operations are concept and attribute additions
or deletions. COntoDiff also determines more complex changes such as merging
or splitting of concepts or the addition/deletion of subgraphs.

Table 1 lists all considered change operations and additionally categorizes
them into one of three groups. The first group contains information extending
operations that add information in $O_v$ such as new concepts, relationships
or attribute values. The second group, information reduction, includes change
operations that remove information from $O_v$. All other operations including
split and merge changes belong to the revision group.

For a quantitative change analysis we assign concepts both from $O_v$ and
$O_{v+1}$ based on their change operations to one of the following sets:

• Extension set: $Ext(O_{v \mapsto v+1})$ = set of concepts in
$O_v \cup O_{v+1}$ where all concept-related change operations are
information extending.

• Reduction set: $Red(O_{v \mapsto v+1})$ = set of concepts in
$O_v \cup O_{v+1}$ where all concept-related change operations are
information reducing.

• Revision set: $Rev(O_{v \mapsto v+1})$ = set of concepts in
$O_v \cup O_{v+1}$ that are involved in at least one change operation but
belong neither to Ext nor to Red. Each concept is thus related to a revise
operation or is related to both extending and reducing operations.

Figure 2: left: Example evolution of two ontologies and a mapping. Concepts
$b_1$ and $b_2$ have been revised, $d_2 \in O2$ has been removed, and $g_1$,
$f_1$, and $f_2$ have been added during the evolution from version
$v = 1 \mapsto 2$. The mapping change between $O1$ and $O2$ comprises two new
correspondences ($(b_1, b_2)$, $(f_1, f_2)$) and two removed correspondences
($(b_1, c_2)$, $(d_1, d_2)$). right: Impact matrix of ontology and mapping
changes.

All other concepts remain unchanged, i.e., they are not affected by any
change operation. Fig. 2 illustrates an evolution example for two ontologies
$O1$ and $O2$. For example, the evolution from $O2_1$ to $O2_2$ might contain
three change operations: insertion of concept $f_2$, deletion of concept
$d_2$, and an attribute value change for concept $b_2$. The three concepts
are thus assigned to Ext, Red, and Rev, respectively, i.e.,
$Ext(O2_{1 \mapsto 2}) = \{f_2\}$, $Red(O2_{1 \mapsto 2}) = \{d_2\}$, and
$Rev(O2_{1 \mapsto 2}) = \{b_2\}$. All other concepts of Fig. 2 are not
affected by the change operations.

The size of the three concept sets Ext, Red, and Rev quantitatively
characterizes the degree of change during the evolution from $O_v$ to
$O_{v+1}$. We therefore define the ontology change ratio as follows:

\[
OCR(O_{v \mapsto v+1}) = \frac{|Ext(O_{v \mapsto v+1}) \cup Red(O_{v \mapsto v+1}) \cup Rev(O_{v \mapsto v+1})|}{|O_v \cup O_{v+1}|}
\]

The ontology change ratio for $O2$ of our running example (Fig. 2) is thus
$OCR(O2_{1 \mapsto 2}) = |\{f_2, d_2, b_2\}| / |\{a_2, b_2, c_2, d_2, e_2, f_2\}| = 0.5$.
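
The set assignment and the change ratio can be sketched in a few lines of
Python. This is a minimal sketch under stated assumptions, not the COntoDiff
implementation: change operations are given as hypothetical (concept, group)
pairs, the function names are ours, and the concept sets of the two $O2$
versions are inferred from the running example of Fig. 2.

```python
def classify_concepts(changes):
    """Assign changed concepts to Ext, Red and Rev.

    `changes` is an iterable of (concept_id, group) pairs with group in
    {"ext", "red", "rev"} (a hypothetical encoding of the change operations).
    """
    groups_by_concept = {}
    for concept, group in changes:
        groups_by_concept.setdefault(concept, set()).add(group)
    # Ext / Red: all operations on the concept are extending / reducing.
    ext = {c for c, groups in groups_by_concept.items() if groups == {"ext"}}
    red = {c for c, groups in groups_by_concept.items() if groups == {"red"}}
    # Rev: every other changed concept (revised, or both extended and reduced).
    rev = set(groups_by_concept) - ext - red
    return ext, red, rev


def ontology_change_ratio(old_concepts, new_concepts, changes):
    """Share of changed concepts among all concepts of both versions."""
    ext, red, rev = classify_concepts(changes)
    all_concepts = set(old_concepts) | set(new_concepts)
    if not all_concepts:
        return 0.0
    return len(ext | red | rev) / len(all_concepts)


# Running example for O2 (concept sets assumed from Fig. 2).
o2_v1 = {"a2", "b2", "c2", "d2", "e2"}
o2_v2 = {"a2", "b2", "c2", "e2", "f2"}
o2_changes = [("f2", "ext"), ("d2", "red"), ("b2", "rev")]

ocr_o2 = ontology_change_ratio(o2_v1, o2_v2, o2_changes)
print(ocr_o2)
```

A concept that is both extended and reduced in one step, e.g. by an attribute
addition and an attribute deletion, ends up in Rev, in line with the set
definitions above.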

3.2 Mapping Changes 

For ontology mapping evolution we employ a simple model that distinguishes
between the addition and deletion of correspondences. Thus, between two
consecutive mapping versions $M_v$ and $M_{v+1}$ we consider whether a new
correspondence has been added (Add) or a previous one has been removed (Del).
We group changed correspondences into the following sets:

• Addition set: $Add(M_{v \mapsto v+1}) = M_{v+1} \setminus M_v$

• Deletion set: $Del(M_{v \mapsto v+1}) = M_v \setminus M_{v+1}$

All other correspondences appear in both mapping versions and are thus
unchanged. Based on the introduced sets we define the mapping change
ratio as follows:

\[
MCR(M_{v \mapsto v+1}) = \frac{|Add(M_{v \mapsto v+1}) \cup Del(M_{v \mapsto v+1})|}{|M_v \cup M_{v+1}|}
\]

In the example of Fig. 2 there are two new correspondences, i.e.,
$Add(M_{1 \mapsto 2}) = \{(b_1, b_2), (f_1, f_2)\}$, and two deleted
correspondences, $(b_1, c_2)$ and $(d_1, d_2)$. Since there is one unchanged
correspondence $(a_1, a_2)$, the mapping change ratio $MCR(M_{1 \mapsto 2})$
equals 4/5.
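
Since Add and Del are plain set differences, the mapping change ratio is easy
to compute. The following Python sketch illustrates this for the running
example; the function names are our own, and the two mapping versions are
assumed from the correspondences named for Fig. 2.

```python
def mapping_diff(m_old, m_new):
    """Return the (Add, Del) sets between two mapping versions."""
    added = m_new - m_old    # correspondences only in the new version
    deleted = m_old - m_new  # correspondences only in the old version
    return added, deleted


def mapping_change_ratio(m_old, m_new):
    """Share of added or deleted correspondences among all correspondences."""
    all_correspondences = m_old | m_new
    if not all_correspondences:
        return 0.0
    added, deleted = mapping_diff(m_old, m_new)
    return len(added | deleted) / len(all_correspondences)


# Mapping versions assumed from the running example (Fig. 2).
m1 = {("a1", "a2"), ("b1", "c2"), ("d1", "d2")}
m2 = {("a1", "a2"), ("b1", "b2"), ("f1", "f2")}

added, deleted = mapping_diff(m1, m2)
mcr = mapping_change_ratio(m1, m2)
print(sorted(added), sorted(deleted), mcr)
```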



3.3 Impact of Ontology on Mapping Changes 

To determine how ontology changes influence or trigger mapping changes it
is useful to interrelate the different kinds of ontology changes and mapping
changes. For this purpose, we interrelate the three sets of changed concepts
(Ext, Red, Rev) with the two sets of changed correspondences (Add, Del).
We will define six corresponding indicators and use them both for analyzing
mapping evolution (see Sec. 4) and for predicting mapping changes for
new ontology versions (see Sec. 5).

The impact ratio is the share of changed concepts that actually had an
impact on the correspondences. For any set of ontology changes $O_{ch}$ (Ext,
Red, or Rev) and mapping changes $M_{ch}$ (Add or Del) it is defined as
follows:

\[
IR(O_{ch}, M_{ch}) = \frac{|\{c \in O_{ch} \mid \exists c' : (c, c') \in M_{ch} \lor (c', c) \in M_{ch}\}|}{|O_{ch}|}
\]

For example, to determine which fraction of additive ontology changes led to
new correspondences we determine the impact ratio for
$O_{ch} = Ext(O1_{1 \mapsto 2}) \cup Ext(O2_{1 \mapsto 2})$ and
$M_{ch} = Add(M_{1 \mapsto 2})$. For the example in Fig. 2, two ($f_1$ and
$f_2$) out of the three Ext-concepts appear in the set of added
correspondences, i.e., the changes in these two concepts had an impact on the
mapping. Therefore $IR(Ext, Add)$ equals 2/3.
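
The impact ratio can be illustrated with a short Python sketch. It assumes
that concept identifiers are unique across both ontologies (as in the running
example, where the index marks the ontology); the function name and the data
layout are ours.

```python
def impact_ratio(changed_concepts, changed_correspondences):
    """Share of changed concepts that occur in a changed correspondence.

    `changed_concepts` plays the role of O_ch (Ext, Red or Rev) and
    `changed_correspondences` the role of M_ch (Add or Del).
    """
    if not changed_concepts:
        return 0.0
    # Collect every concept on either side of a changed correspondence.
    involved = {c for pair in changed_correspondences for c in pair}
    return len(set(changed_concepts) & involved) / len(changed_concepts)


# Running example: Ext concepts of O1 and O2, and the added correspondences.
ext_concepts = {"g1", "f1", "f2"}
added_correspondences = {("b1", "b2"), ("f1", "f2")}

ir_ext_add = impact_ratio(ext_concepts, added_correspondences)
print(ir_ext_add)
```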

One would expect that Ext concepts mostly lead to correspondence additions 
whereas Red concepts usually account for correspondence deletions. However, 
as we will see in our evaluation (see Sec. 4), Ext concepts may also trigger
correspondence deletions and Red concepts may lead to new correspondences 
depending on the match technique. 



4 Analysis of Mapping Evolution 

After introducing the experimental setup, we analyze ontology and mapping 
evolution for different life science scenarios. We then compare mapping evolution 
for different match strategies and evaluate the impact of ontology changes on 
mapping changes. 
Figure 3: Ontology and mapping growth factors. Number of concepts
($|C_{2006-06}|$) and number of mapping correspondences ($|M_{2006-06}|$) in
the first considered version. $|C|$ is the sum of domain and range ontology
size for each match problem. Growth factors compare the first (2006-06) and
last (2010-12) considered version.

4.1 Setup 

We consider three mapping scenarios: 

• Anatomy: map Adult Mouse Anatomy Ontology (MA) to the anatomy 
part of NCI Thesaurus (NCITa) 

• Molecular Biology: map the two Gene Ontology [SI sub-ontologies Molec- 
ular Functions (MF) and Biological Processes (BP) 

• Chemistry: map Chemical Entities of Biological Interest (ChEBI) 5 to 
NCI Thesaurus (NCIT) 

For each input ontology we consider 10 versions at half-year intervals
between 2006-06 and 2010-12 and map versions of the same release date with
each other. We use the following metadata-based matchers to compute the
confidence (similarity) for any concept pair of two ontologies:

• Name: String (trigram) similarity of concept names 

• NameSyn: Maximal string (trigram) similarity of names and synonyms 

• Context: String (trigram) similarity of the concatenated parent, concept, 
and children names 
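
The three matchers can be sketched as follows. The text only states that a
trigram string similarity is used; the concrete variant below (Dice
coefficient over lowercased character trigrams, no padding) is an assumption,
as are the function names and the dictionary layout of a concept.

```python
def trigrams(text):
    """Set of character trigrams of a lowercased string."""
    s = text.lower()
    if len(s) < 3:
        return {s} if s else set()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def trigram_similarity(a, b):
    """Dice coefficient of the trigram sets (assumed similarity variant)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def name_similarity(c1, c2):
    """Name matcher: similarity of the concept names."""
    return trigram_similarity(c1["name"], c2["name"])


def name_syn_similarity(c1, c2):
    """NameSyn matcher: maximal similarity over names and synonyms."""
    labels1 = [c1["name"], *c1.get("synonyms", [])]
    labels2 = [c2["name"], *c2.get("synonyms", [])]
    return max(trigram_similarity(x, y) for x in labels1 for y in labels2)


def context_similarity(c1, c2):
    """Context matcher: similarity of concatenated parent, concept, child names."""
    def context(c):
        return " ".join([*c.get("parents", []), c["name"], *c.get("children", [])])
    return trigram_similarity(context(c1), context(c2))


# Hypothetical concepts for illustration only.
concept_a = {"name": "cardiac muscle", "synonyms": ["heart muscle"]}
concept_b = {"name": "heart muscle", "synonyms": []}
print(name_similarity(concept_a, concept_b))
print(name_syn_similarity(concept_a, concept_b))
```

In this example the names alone overlap only partially, while the shared
synonym gives NameSyn a perfect score, which is the kind of dependency on
synonym changes discussed in Sec. 4.4.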

In this study we focus on the evolution of ontology mappings and do not 
evaluate the quality of matching. The choice of match strategies is based on 
previous studies where matching on concept names and synonyms achieved high 
quality especially for anatomy ontologies [lOl [TT| . To obtain precise results we 
need to select the most likely correspondences exceeding a certain confidence 
threshold. We applied a default confidence threshold of 0.6; for the NameSyn
matcher, we also considered a stricter threshold of 0.8. Moreover, for each input
ontology concept, we only select the top correspondences in a small delta range 
(MaxDelta selection [6]). 
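
The selection step can be sketched as below. This is one possible reading of
threshold plus MaxDelta selection, not the implementation used in the study:
we assume that a pair is kept if its confidence reaches the threshold and
lies within a delta of the best candidate for both of its concepts. The
function name and the delta value of 0.05 are hypothetical.

```python
def select_correspondences(similarities, threshold=0.6, delta=0.05):
    """Keep pairs above `threshold` that are within `delta` of the best
    candidate of both their source and their target concept (assumed
    reading of MaxDelta selection).

    `similarities` maps (source_concept, target_concept) to a confidence.
    """
    candidates = {pair: s for pair, s in similarities.items() if s >= threshold}
    best_source, best_target = {}, {}
    for (c1, c2), s in candidates.items():
        best_source[c1] = max(s, best_source.get(c1, 0.0))
        best_target[c2] = max(s, best_target.get(c2, 0.0))
    return {
        (c1, c2)
        for (c1, c2), s in candidates.items()
        if s >= best_source[c1] - delta and s >= best_target[c2] - delta
    }


# Hypothetical confidences for illustration only.
confidences = {
    ("a1", "a2"): 0.90,
    ("a1", "b2"): 0.87,  # close to the best match of a1, so it is kept
    ("a1", "c2"): 0.70,  # above the threshold but too far from the best match
    ("b1", "c2"): 0.50,  # below the threshold
}
selected = select_correspondences(confidences)
print(sorted(selected))
```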

4.2 Ontology and Mapping Evolution 

Fig. 3 gives an overview of the ontology and mapping sizes as well as their
growth between June 2006 and Dec. 2010. For Anatomy, the combined number of
concepts in domain and range ontology ($|C|$) grew only slightly by a factor
of 1.1 to almost 10,000 concepts. By contrast, $|C|$ increased by 60-70% to
30,000 and 120,000 concepts for Molecular Biology and Chemistry,
respectively. In two of the three scenarios (Anatomy and Molecular Biology),
the mappings grew about as strongly as the ontologies, while the Chemistry
mappings grew by up to a factor of 6. The especially high mapping growth for
the Context matcher seems influenced by its very small mapping size, which in
turn is caused by its need to find similar names not only for the concepts
but also for their parent and child concepts. Comparing the results for
NameSyn with two different thresholds, we find that a higher threshold
produces smaller mappings and achieves only a relatively small coverage,
especially for Molecular Biology. For Molecular Biology, the Name matcher
produced the most stable mappings.

Fig. 4(a) shows ontology change factors (see Sec. 3.1) between succeeding
versions for the three domains during the 5-year observation period. For
Anatomy there were only few changes over time compared to the other two
domains. Molecular Biology shows high change rates until 2007 (nearly 40%).
From 2008 on, change rates are comparable to those of Chemistry (around 20%).
Fig. 4(b) illustrates more detailed mapping evolution results for NameSyn 0.6
in Molecular Biology. In general, correspondence additions dominate, leading
to a final mapping size of more than 2,500 correspondences. But there has
also been a considerable number of deletions. In 2007-12 nearly 500
correspondences were removed from the mapping. This shows that mappings can
change very heavily.



4.3 Comparison of Match Strategies 

To analyze the mapping stability for different match strategies in more
detail, we examine a possible correlation between ontology and mapping
changes over time. We therefore compute ontology and mapping change factors
for all three match scenarios and the four match strategies (Fig. 5 a-c). For
Anatomy, ontologies and mappings changed only slightly (see y-axis range),
while the other two scenarios experience a surprisingly high degree of
mapping changes between 10 and 80%. Except for Chemistry we observe a strong
correlation between the ontology change factor (black continuous line) and
the mapping change factors of the different match strategies (colored dashed
lines). The Name matcher was relatively stable in general while the Context
matcher was most heavily influenced by ontology evolution. This especially
holds for Chemistry where 80% of the Context mappings changed in 2008. The
reason for the relative instability of Context is mainly its use of more
ontological information that can change, i.e., changes on both parent and
child concepts have an influence. For instance, moving a concept from one
parent concept to another might completely change a concept's context. For
Molecular Biology the mappings (especially NameSyn) changed heavily in
2007-12, although the maximum ontology evolution already occurred in 2007-06.
This results from successive modification of GO-BP and GO-MF in 2007. The
combined changes in both sub-ontologies seem to have led to numerous mapping
changes in 2007-12.

Figure 4: (a) Ontology change factors, (b) Mapping evolution for NameSyn 0.6
matcher in Molecular Biology example.

Figure 5: Ontology and mapping change factors for three life science domain
examples (a) Anatomy, (b) Molecular Biology, (c) Chemistry

4.4 Impact of Ontology on Mapping Changes 

Fig. 6 illustrates the real impact of ontology changes (Ext, Red, Rev) on
mapping changes (Add, Del). As an example, we show results for NameSyn 0.6,
averaged over all versions. The table shows the number of changed concepts as
well as the ratio having impact on mapping changes (IR). First, we can
observe that a high share (>80%) of ontology extensions, reductions and
revisions has no impact on the ontology mappings. This is due to a limited
match coverage since changed ontology parts that are not covered by the
ontology mapping do not result in mapping changes. Second, extending ontology
changes (Ext) primarily cause correspondence additions and no or only few
correspondence deletions for all three scenarios. Third, Red concepts are
primarily involved in correspondence deletions but also in some additions.
The latter might result from specific matcher characteristics. Imagine a
concept loses a synonym and also the correspondence based on this synonym.
This can enable a new correspondence by relating the concept to another one
than before. Thus, a synonym deletion can lead to a correspondence deletion
and addition in one evolution step. Finally, revised concepts (Rev) trigger
both Add and Del. This is intuitive since revised concepts might have been
extended and reduced in one evolution step (e.g., attribute addition and
deletion). In general, ontology revisions account for a high share of mapping
changes while deletions play only a minor role.

Figure 6: Impact of ontology concept changes (Ext, Red, Rev) on mapping
changes (Add, Del) for NameSyn 0.6. Average values for absolute change number
($|Ext|$, $|Red|$, $|Rev|$) and impact association ratios
($IR(O_{ch}, M_{ch})$, displayed as percentage) over all considered versions

Table 2: Example prediction scenario.

4.5 Summary 

We evaluated ontology and mapping evolution for three real-world life science
domains (Anatomy, Molecular Biology and Chemistry) and took four match
strategies into account. The analysis results show that especially Molecular
Biology and Chemistry undergo heavy ontology extensions and revisions whereas
Anatomy is relatively stable. Since existing knowledge is mainly extended or
revised, we find only few ontology reducing changes for all domains. Ontology
evolution heavily influenced mappings computed by different metadata-based
match strategies. In particular, the structural matcher Context produced
rather unstable results whereas mappings based on the Name matcher are
relatively stable. As expected, ontology extensions primarily lead to
correspondence additions and information reducing ontology changes primarily
lead to the removal of correspondences. Ontology revisions play an important
role and result in both the addition and deletion of correspondences.

5 Mapping Change Estimation 

We now present two methods to estimate the number of changes in a new
mapping version. By predicting future mapping changes we can advise users
whether it might be necessary to recompute their mappings. This seems
especially useful when one must decide whether or not to perform an expensive
manual mapping adaptation. We first describe the methods and then
comparatively evaluate their quality on our mapping problems.

5.1 Prediction Methods 

The general task of estimating mapping changes is the following. After the
release of two new ontology versions $O1_k$/$O2_k$ we would like to predict
the number of mapping changes ($|Add(M_{k-1 \mapsto k})|$,
$|Del(M_{k-1 \mapsto k})|$) which will occur between the mapping versions
$M_{k-1}$ and $M_k$, i.e., we would like to know how strongly mapping $M$ is
likely to change due to modifications in $O1$/$O2$. For this estimation we
can access the content of the previous $h$ ontology/mapping versions
($v = k-h, \ldots, k-1$) and their diff results. In the following we describe
two prediction methods, namely Mapping-based Estimation (ME) and Impact-based
Estimation (IE). The synthetic example in Table 2 will be used for
illustration.

Figure 7: Prediction analysis (a) Example for predicting the successor version
(red dotted line) on the basis of a window of 5 predecessor versions
($h = 5$), (b) Average error sum (avg(errSum)) of false predictions for
$h = 2 \ldots 5$ for three methods ME avg, ME $w^2$, IE $w^2$



Mapping-based Estimation In this approach the prediction only uses
information about previous mapping changes but not about the underlying
ontology changes. The estimation for $|Add(M_{k-1 \mapsto k})|$ and
$|Del(M_{k-1 \mapsto k})|$ is the weighted average of the number of changes
observed in the last $h-1$ version changes of the mapping. We can use
different functions $w$ to weight the version changes:

\[
|Add(M_{k-1 \mapsto k})| = \sum_{i=k-h+1}^{k-1} w_i \cdot |Add(M_{i-1 \mapsto i})|
\]
\[
|Del(M_{k-1 \mapsto k})| = \sum_{i=k-h+1}^{k-1} w_i \cdot |Del(M_{i-1 \mapsto i})|
\]

For our example in Table 2 we would like to predict the number of added
correspondences between version 3 and 4 ($|Add(M_{3 \mapsto 4})|$) using the
versions 1-3 ($h = 3$). We use a quadratic weighting function with the
following weights for the two previous version changes: $\frac{1}{5}$ and
$\frac{4}{5}$. We would thus estimate
$|Add(M_{3 \mapsto 4})| = \frac{1}{5} \cdot 20 + \frac{4}{5} \cdot 10 = 12$
with the ME method.
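
The mapping-based estimation can be sketched in Python as follows. The
weights 1/5 and 4/5 of the example correspond to squared positions normalized
to sum 1; extending this scheme to more than two version changes is our
assumption, and all function names are hypothetical.

```python
def average_weights(n):
    """Equal weights for n previous version changes (ME avg)."""
    return [1 / n] * n


def quadratic_weights(n):
    """Squared positions 1, 4, 9, ... normalized to sum 1, oldest change
    first (assumed generalization of the weights 1/5 and 4/5)."""
    raw = [(i + 1) ** 2 for i in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


def mapping_based_estimate(previous_counts, weight_fn=quadratic_weights):
    """Weighted average of the change counts of the last h-1 version changes.

    `previous_counts` lists |Add| (or |Del|) per version change, oldest first.
    """
    weights = weight_fn(len(previous_counts))
    return sum(w * c for w, c in zip(weights, previous_counts))


# Example from the text: 20 additions for 1 -> 2 and 10 additions for 2 -> 3.
previous_additions = [20, 10]
estimate_w2 = mapping_based_estimate(previous_additions)
estimate_avg = mapping_based_estimate(previous_additions, average_weights)
print(estimate_w2, estimate_avg)
```

With quadratic weights the most recent version change dominates the
estimate, whereas the plain average treats both changes equally.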

Impact-based Estimation The idea behind impact-based estimation is to use
knowledge about the impact of ontology on mapping changes to estimate the
number of correspondence changes. We assume that the number of added/deleted
correspondences can be expressed as a linear combination of the observed
ontology changes having an impact:

\[
|Add(M_{k-1 \mapsto k})| = \beta \cdot \big( agg(IR(Ext, Add)) \cdot |Ext(O_{k-1 \mapsto k})|
  + agg(IR(Red, Add)) \cdot |Red(O_{k-1 \mapsto k})|
  + agg(IR(Rev, Add)) \cdot |Rev(O_{k-1 \mapsto k})| \big)
\]
\[
|Del(M_{k-1 \mapsto k})| = \beta \cdot \big( agg(IR(Ext, Del)) \cdot |Ext(O_{k-1 \mapsto k})|
  + agg(IR(Red, Del)) \cdot |Red(O_{k-1 \mapsto k})|
  + agg(IR(Rev, Del)) \cdot |Rev(O_{k-1 \mapsto k})| \big)
\]

For both formulas we need to specify two parameters. First, we need to
determine the impact ratios (IR) which indicate how strongly ontology changes
will influence the mapping in the current version change. Since we consider
the last $h-1$ version changes, we need to aggregate the observed impact
ratios into a common value (agg function), e.g., by a normal or weighted
average. Second, we need to determine the $\beta$ parameter which performs an
error correction on the result. In particular, for each version change we
calculate the estimated value using the linear combination formula with the
impact ratios observed. We then compare the estimation with the correct
result and compute an error ratio between both. We finally take the average
of all computed error ratios as our $\beta$.

For our example we need to determine three impact ratios. We will use the
same quadratic weighting as for ME to compute a weighted average. Thus, for
$IR(Ext, Add)$ we would determine a value of
$\frac{1}{5} \cdot 0.3 + \frac{4}{5} \cdot 0.4 = 0.38$ ($IR(Red, Add) = 0.02$
and $IR(Rev, Add) = 0.12$). We further calculate the error ratio for each
version change, e.g., for $1 \mapsto 2$ the estimated result is
$0.3 \cdot 60 + 0.1 \cdot 10 + 0.2 \cdot 15 = 22$. A comparison with the
correct number of correspondence additions ($|Add(M_{1 \mapsto 2})| = 20$)
results in a ratio of $\frac{20}{22} \approx 0.91$ ($2 \mapsto 3$: 0.77).
Thus, the average error ratio over all version changes is $\beta = 0.84$,
resulting in an estimation of
$|Add(M_{3 \mapsto 4})| = 0.84 \cdot (0.38 \cdot 40 + 0.02 \cdot 4 + 0.12 \cdot 12) \approx 14$.
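
The worked example can be retraced with a small Python sketch. It only uses
numbers quoted in the text: the impact ratios and change counts for
$1 \mapsto 2$, the error ratio 0.77 for $2 \mapsto 3$, the aggregated impact
ratios, and the change counts for $3 \mapsto 4$. The function names and the
dictionary layout are our own.

```python
KINDS = ("ext", "red", "rev")


def linear_combination(impact_ratios, change_counts):
    """Sum of impact ratio times number of changed concepts per change kind."""
    return sum(impact_ratios[k] * change_counts[k] for k in KINDS)


def error_ratio(observed, impact_ratios, change_counts):
    """Ratio between the observed and the uncorrected estimated change count."""
    return observed / linear_combination(impact_ratios, change_counts)


def impact_based_estimate(agg_impact_ratios, new_change_counts, error_ratios):
    """Linear combination corrected by beta, the average error ratio."""
    beta = sum(error_ratios) / len(error_ratios)
    return beta * linear_combination(agg_impact_ratios, new_change_counts)


# Version change 1 -> 2: values quoted in the text.
ratio_1_2 = error_ratio(
    20,
    {"ext": 0.3, "red": 0.1, "rev": 0.2},
    {"ext": 60, "red": 10, "rev": 15},
)
# Version change 2 -> 3: only the resulting error ratio is given in the text.
ratio_2_3 = 0.77

# Aggregated impact ratios and the ontology changes for 3 -> 4.
agg_ir_add = {"ext": 0.38, "red": 0.02, "rev": 0.12}
changes_3_4 = {"ext": 40, "red": 4, "rev": 12}

estimated_additions = impact_based_estimate(
    agg_ir_add, changes_3_4, [ratio_1_2, ratio_2_3]
)
print(round(ratio_1_2, 2), round(estimated_additions))
```

Rounded to whole correspondences, the sketch arrives at the same estimate of
about 14 additions as the text.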

5.2 Evaluation 

We now apply our two estimation methods to predict how many correspondence
additions and deletions might occur in a future mapping. We use the same
datasets as before (see Sec. 4). For the mapping-based method (ME), we
applied two different weight functions: average (avg) and quadratic weighted
average ($w^2$). For the impact-based method (IE), we only show results for
$w^2$ since this proved to be more effective. To get an overview of how
accurate both methods are and how many versions are required for a good
estimation, we performed the following experiment.

We predict the last five mapping versions using several numbers of
predecessor versions ($h = 2 \ldots 5$). Fig. 7(a) shows an example of the
experimental scenario for $h = 5$. We produce five results per $h$ for the
ME avg, ME $w^2$ and IE $w^2$ prediction methods. For each $h$ and method we
compute an error sum (errSum: sum of absolute differences between correct
(CR) and predicted result (PR)) over all prediction results for three
matchers (Name 0.6, NameSyn 0.6, Context 0.6) and all match scenarios. To
better compare the methods and to study the influence of $h$ we compute
average error sums which are displayed in Fig. 7(b).
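
The error measure is simple enough to state as code. The sketch below
computes errSum for one series of predictions and its average per
prediction; the exact averaging used for Fig. 7(b) is not spelled out in the
text, so dividing by the number of predictions is an assumption, and the
numbers are hypothetical and only serve as illustration.

```python
def err_sum(correct_results, predicted_results):
    """Sum of absolute differences between correct (CR) and predicted (PR)."""
    return sum(abs(cr - pr) for cr, pr in zip(correct_results, predicted_results))


def average_err_sum(correct_results, predicted_results):
    """errSum divided by the number of predictions (assumed averaging)."""
    if not correct_results:
        return 0.0
    return err_sum(correct_results, predicted_results) / len(correct_results)


# Hypothetical change counts for five predicted mapping versions.
correct = [20, 35, 10, 50, 15]
predicted = [25, 30, 10, 42, 18]

total_error = err_sum(correct, predicted)
mean_error = average_err_sum(correct, predicted)
print(total_error, mean_error)
```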




For $h = 2$, ME avg and ME $w^2$ produce the same results since they only
consider one mapping version diff. For a higher number of predecessor
versions ($h > 2$) ME $w^2$ produces smaller errors. Overall IE $w^2$ is more
effective than both ME methods, i.e., additionally using information about
ontology evolution seems to be more informative and thus leads to more
accurate results. In particular, considering only the recent past (small $h$)
suffices to make a good estimation with our impact-based method IE.

To get an impression of how many change operations we predict for ME $w^2$
and IE $w^2$, we selected the following case. Considering the change factors
in Fig. 5, we would expect that it is hard to predict version 2009-06 based
on 2008-06 and 2008-12 for all three match scenarios. In particular, for
Anatomy and Chemistry we see a strong decrease in their change factors
whereas for Molecular Biology an increase occurred. Fig. 8 shows detailed
results of the prediction case. The error rate err gives the absolute
difference of PR and CR divided by the respective mapping size for the
predicted version ($|M_{2009-06}|$). To get a better overview we illustrate
err on a red-green scale. Overall both methods produce relatively good
results (green err values) for correspondence deletions. By contrast,
estimating additions seems more complicated. IE produces only small errors
for additions whereas ME either estimates too high (for Anatomy and
Chemistry) or too low (for Molecular Biology) values (yellow to red err
values). This is triggered by the previous trend of mapping evolution, as we
have seen in Fig. 5. Thus, if the pattern of mapping evolution suddenly
changes, methods making an estimation solely on the basis of previous mapping
changes fail.

Figure 8: Number of correct and estimated AddCorr and DelCorr operations
using mapping versions $M_{2008-06 \mapsto 2008-12}$ to predict changes in
$M_{2009-06}$. Comparison of two methods (ME $w^2$, IE $w^2$), for three life
science domains and the three matchers. CR (PR) - number of correct
(predicted) result, err - error rate on a red (high err) green (small err)
scale

By contrast, IE $w^2$ involves knowledge about ontology evolution as well as
its impact on mapping evolution, which leads to more accurate prediction
results. Especially considering the overall mapping sizes, the predicted
results (PR) for Anatomy are very close to the correct results (CR) (e.g.,
8-13, 13-14, 11-11 for correspondence additions). In general, it seems very
difficult to predict mapping changes for Chemistry and the Context 0.6
matcher. For Chemistry one and the same ontology change factor can lead to
mapping changes of different magnitude so that change prediction becomes a
complex task. For Context 0.6, there are several different influences, such
as the evolution of the concept itself, its parents and its children, making
it difficult to correctly predict mapping changes.

In general, we conclude that the OAEI Anatomy mapping is still usable and
reliable, as there were relatively few ontology changes since 2007.
Thus, we would expect only few mapping adaptations. By contrast, knowledge
in the Molecular Biology or Chemistry domains changed dramatically in the
last 5 years. Thus, mapping adaptation is strongly recommended to obtain
useful mappings.

6 Related Work 

In the last decade, ontology matching to semi-automatically create ontology 
mappings has become an active research field (see [7J [53] for overviews). In 
the life sciences especially the matching of anatomy ontologies [SU] and molec- 
ular biological ontologies [2 has attracted considerable interest. Most match 
approaches focus on improving the quality of computed mappings by applying 
different matchers (e.g., based on the name/synonyms of concepts, the ontology
structure or associated instances) in a workflow-like manner. For comparing
available match systems w.r.t. their quality the OAEI [22] provides gold
standard mappings, e.g., between MA and NCIT.

Previous work on ontology evolution (see [HI [H] for surveys) focused on on- 
tology versioning [T7] , the evolution process itself [3S] as well as the detection of 
changes between ontology versions [10]. Few approaches investigate how changes
in ontologies should be propagated to dependent artifacts such as instances or
annotations. For example, the ontology evolution process proposed in [26]
includes a change propagation phase where performed changes are propagated to
other ontologies that are based on the modified ontology. 

The evolution of ontology mappings has received only little attention so far. 
In our previous work [13] we studied the evolution of ontologies, annotations
and ontology mappings. We analyzed mapping evolution for one match problem 
and noticed dramatic increases in the number of correspondences especially for 
instance-based matchers. In a further study [25] we focused on the stability of 
correspondences created by an instance-based matcher and proposed measures 
which allow for a classification of (un)stable correspondences. 

In contrast to previous work this study focuses on the impact of ontology 
on mapping changes, i.e., we investigate (1) how ontology mappings change and 
(2) study how ontology changes correlate with mapping changes for different 
matchers. Furthermore, we use the knowledge from the correlation between 
ontology and mapping changes to estimate the cardinality of future mapping 
changes. The mapping versions under investigation were created with previ- 
ously evaluated matchers such as name or name/synonym using the GOMMA 
system [16]. 

7 Conclusion and Future Work 

We studied the evolution of ontology mappings and analyzed the ontology 
changes triggering mapping changes as well as the influence of different match 
techniques. Our analysis covered three life science mappings and three match 
strategies. Furthermore we proposed two prediction methods for estimating the 
cardinality of future mapping changes. Except for anatomy ontologies, we ob- 
served that ontology mappings based on common match strategies using name 
and synonym information often experience heavy changes. Our prediction meth- 
ods were quite effective and could reasonably estimate the number of correspon- 
dence additions and removals in a new mapping version. In future work, we plan 
to investigate how known ontology changes can be used to semi-automatically 
adapt ontology mappings without a completely new mapping determination. 